{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## QM9 Dataset exploration"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The purpose of this notebook is as follows,\n",
    "\n",
    " - Explain [QM9 dataset](http://quantum-machine.org/datasets/): Check the labels and visualization of molecules to understand what kind of data are stored.\n",
    " - Explain internal structure of QM9 dataset in `chainer_chemistry`: We handle the dataset with `NumpyTupleDataset`.\n",
    " - Explain how `preprocessor` and `parser` work on `chainer_chemistry`: One concrete example using `GGNNPreprocessor` is explained.\n",
    "\n",
    "It is out of scope of this notebook to explain how to train graph convolutional network using this dataset, please refer [document tutorial](http://chainer-chemistry.readthedocs.io/en/latest/tutorial.html#) or try `train_qm9.py` in [QM9 example](https://github.com/pfnet-research/chainer-chemistry/tree/master/examples/qm9) for the model training.\n",
    "\n",
    "[Note]\n",
    "This notebook is executed on 1, March, 2018.\n",
    "The behavior of QM9 dataset in `chainer_chemistry` might change in the future."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Loading modules and set loglevel."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import logging\n",
    "from rdkit import RDLogger\n",
    "from chainer_chemistry import datasets\n",
    "\n",
    "# Disable errors by RDKit occurred in preprocessing QM9 dataset.\n",
    "lg = RDLogger.logger()\n",
    "lg.setLevel(RDLogger.CRITICAL)\n",
    "\n",
    "# show INFO level log from chainer chemistry\n",
    "logging.basicConfig(level=logging.INFO)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "QM9 dataset can be downloaded automatically by chainer chemistry. \n",
    "Original format of QM9 dataset is zipped file where each molecule's information is stored in each \"xyz\" file.\n",
    "\n",
    "Chainer Chemistry automatically merge these information in one csv file internally, you may check the file path of this csv file with `get_qm9_filepath` method. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "dataset_filepath = datasets.get_qm9_filepath()\n",
    "\n",
    "print('dataset_filepath =', dataset_filepath)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The dataset contains several chemical/physical properties. The labels of QM9 dataset can be checked by `get_qm9_label_names` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "QM9 label_names = ['A', 'B', 'C', 'mu', 'alpha', 'homo', 'lumo', 'gap', 'r2', 'zpve', 'U0', 'U', 'H', 'G', 'Cv']\n"
     ]
    }
   ],
   "source": [
    "label_names = datasets.get_qm9_label_names()\n",
    "print('QM9 label_names =', label_names)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "More detail information is described in `readme.txt` of QM9 dataset, which can be downloaded from \n",
    " - [https://figshare.com/articles/Readme_file%3A_Data_description_for__Quantum_chemistry_structures_and_properties_of_134_kilo_molecules_/1057641](https://figshare.com/articles/Readme_file%3A_Data_description_for__Quantum_chemistry_structures_and_properties_of_134_kilo_molecules_/1057641)\n",
    "\n",
    "Below is the description of each property(label), written in readme.txt\n",
    "\n",
    "<a id='table1'></a>\n",
    "<blockquote cite=\"https://figshare.com/articles/Readme_file%3A_Data_description_for__Quantum_chemistry_structures_and_properties_of_134_kilo_molecules_/1057641\">\n",
    "<pre>\n",
    "I.  Property  Unit         Description\n",
    "--  --------  -----------  --------------\n",
    " 1  tag       -            \"gdb9\"; string constant to ease extraction via grep\n",
    " 2  index     -            Consecutive, 1-based integer identifier of molecule\n",
    " 3  A         GHz          Rotational constant A\n",
    " 4  B         GHz          Rotational constant B\n",
    " 5  C         GHz          Rotational constant C\n",
    " 6  mu        Debye        Dipole moment\n",
    " 7  alpha     Bohr^3       Isotropic polarizability\n",
    " 8  homo      Hartree      Energy of Highest occupied molecular orbital (HOMO)\n",
    " 9  lumo      Hartree      Energy of Lowest occupied molecular orbital (LUMO)\n",
    "10  gap       Hartree      Gap, difference between LUMO and HOMO\n",
    "11  r2        Bohr^2       Electronic spatial extent\n",
    "12  zpve      Hartree      Zero point vibrational energy\n",
    "13  U0        Hartree      Internal energy at 0 K\n",
    "14  U         Hartree      Internal energy at 298.15 K\n",
    "15  H         Hartree      Enthalpy at 298.15 K\n",
    "16  G         Hartree      Free energy at 298.15 K\n",
    "17  Cv        cal/(mol K)  Heat capacity at 298.15 K\n",
    "</pre>\n",
    "</blockquote>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Preprocessing dataset\n",
    "\n",
    "Dataset extraction depends on the preprocessing method, which is determined by `preprocessor`.\n",
    "\n",
    "Here, let's look an example of using `GGNNPreprocessor` preprocessor for QM9 dataset extraction.\n",
    "\n",
    "Procedure is as follows,\n",
    "\n",
    "1. Instantiate `preprocessor` (here `GGNNPreprocessor` is used).\n",
    "2. call `get_qm9` method with `preprocessor`.\n",
    " - `labels=None` option is used to extract all labels. In this case, 15 types of physical properties are extracted (see above).\n",
    "\n",
    "Note that `return_smiles` option can be used to get SMILES information together with the dataset itself."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|████████████████████████████████████████████████████████████████████████| 133885/133885 [01:35<00:00, 1406.47it/s]\n",
      "INFO:chainer_chemistry.dataset.parsers.csv_file_parser:Preprocess finished. FAIL 0, SUCCESS 133885, TOTAL 133885\n"
     ]
    }
   ],
   "source": [
    "from chainer_chemistry.dataset.preprocessors.ggnn_preprocessor import \\\n",
    "    GGNNPreprocessor\n",
    "    \n",
    "preprocessor = GGNNPreprocessor()\n",
    "dataset, dataset_smiles = datasets.get_qm9(preprocessor, labels=None, return_smiles=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Check extracted dataset\n",
    "\n",
    "First, let's check type and number of dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "dataset information...\n",
      "dataset <class 'chainer_chemistry.datasets.numpy_tuple_dataset.NumpyTupleDataset'> 133885\n",
      "smiles information...\n",
      "dataset_smiles <class 'numpy.ndarray'> 133885\n"
     ]
    }
   ],
   "source": [
    "print('dataset information...')\n",
    "print('dataset', type(dataset), len(dataset))\n",
    "\n",
    "print('smiles information...')\n",
    "print('dataset_smiles', type(dataset_smiles), len(dataset_smiles))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As you can see, QM9 dataset consists of 133885 data.\n",
    "\n",
    "The dataset is a class of `NumpyTupleDataset`, where i-th dataset features can be accessed by `dataset[i]`.\n",
    "\n",
    "When `GGNNPreprocessor` is used, each dataset consists of following features\n",
    " 1. atom feature: representing atomic number of given molecule. \n",
    " 2. adjacency matrix feature: representing adjacency matrix of given molecule.\n",
    "    `GGNNPreprocessor` extracts adjacency matrix of each bonding type.\n",
    " 3. label feature: representing chemical properties (label) of given molecule.\n",
    "    Please refer [above table](#table1) for details."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's look an example of 7777-th dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "index=7777, SMILES=CC1=NCCC(C)O1\n",
      "atom (8,) [6 6 7 6 6 6 6 8]\n",
      "adj (4, 8, 8)\n",
      "adjacency matrix for SINGLE bond type\n",
      " [[0. 1. 0. 0. 0. 0. 0. 0.]\n",
      " [1. 0. 0. 0. 0. 0. 0. 1.]\n",
      " [0. 0. 0. 1. 0. 0. 0. 0.]\n",
      " [0. 0. 1. 0. 1. 0. 0. 0.]\n",
      " [0. 0. 0. 1. 0. 1. 0. 0.]\n",
      " [0. 0. 0. 0. 1. 0. 1. 1.]\n",
      " [0. 0. 0. 0. 0. 1. 0. 0.]\n",
      " [0. 1. 0. 0. 0. 1. 0. 0.]]\n",
      "adjacency matrix for DOUBLE bond type\n",
      " [[0. 0. 0. 0. 0. 0. 0. 0.]\n",
      " [0. 0. 1. 0. 0. 0. 0. 0.]\n",
      " [0. 1. 0. 0. 0. 0. 0. 0.]\n",
      " [0. 0. 0. 0. 0. 0. 0. 0.]\n",
      " [0. 0. 0. 0. 0. 0. 0. 0.]\n",
      " [0. 0. 0. 0. 0. 0. 0. 0.]\n",
      " [0. 0. 0. 0. 0. 0. 0. 0.]\n",
      " [0. 0. 0. 0. 0. 0. 0. 0.]]\n",
      "adjacency matrix for TRIPLE bond type\n",
      " [[0. 0. 0. 0. 0. 0. 0. 0.]\n",
      " [0. 0. 0. 0. 0. 0. 0. 0.]\n",
      " [0. 0. 0. 0. 0. 0. 0. 0.]\n",
      " [0. 0. 0. 0. 0. 0. 0. 0.]\n",
      " [0. 0. 0. 0. 0. 0. 0. 0.]\n",
      " [0. 0. 0. 0. 0. 0. 0. 0.]\n",
      " [0. 0. 0. 0. 0. 0. 0. 0.]\n",
      " [0. 0. 0. 0. 0. 0. 0. 0.]]\n",
      "adjacency matrix for AROMATIC bond type\n",
      " [[0. 0. 0. 0. 0. 0. 0. 0.]\n",
      " [0. 0. 0. 0. 0. 0. 0. 0.]\n",
      " [0. 0. 0. 0. 0. 0. 0. 0.]\n",
      " [0. 0. 0. 0. 0. 0. 0. 0.]\n",
      " [0. 0. 0. 0. 0. 0. 0. 0.]\n",
      " [0. 0. 0. 0. 0. 0. 0. 0.]\n",
      " [0. 0. 0. 0. 0. 0. 0. 0.]\n",
      " [0. 0. 0. 0. 0. 0. 0. 0.]]\n",
      "labels [ 3.1431000e+00  1.8749400e+00  1.2443100e+00  1.9313999e+00\n",
      "  7.3379997e+01 -2.3750000e-01  3.4699999e-02  2.7219999e-01\n",
      "  1.0124120e+03  1.6597100e-01 -3.6510001e+02 -3.6509183e+02\n",
      " -3.6509088e+02 -3.6513235e+02  3.0584999e+01]\n"
     ]
    }
   ],
   "source": [
    "index = 7777\n",
    "\n",
    "print('index={}, SMILES={}'.format(index, dataset_smiles[index]))\n",
    "atom, adj, labels = dataset[index]\n",
    "# This molecule has N=8 atoms.\n",
    "print('atom', atom.shape, atom)\n",
    "# adjacency matrix is NxN matrix, where N is number of atoms in the molecule.\n",
    "# Unlike usual adjacency matrix, diagonal elements are filled with 1, for NFP calculation purpose.\n",
    "print('adj', adj.shape)\n",
    "print('adjacency matrix for SINGLE bond type\\n', adj[0])\n",
    "print('adjacency matrix for DOUBLE bond type\\n', adj[1])\n",
    "print('adjacency matrix for TRIPLE bond type\\n', adj[2])\n",
    "print('adjacency matrix for AROMATIC bond type\\n', adj[3])\n",
    "\n",
    "print('labels', labels)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Visualizing the molecule\n",
    "\n",
    "One might want to visualize molecule given SMILES information. Here is an example code:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# This script is referred from http://rdkit.blogspot.jp/2015/02/new-drawing-code.html\n",
    "# and http://cheminformist.itmol.com/TEST/wp-content/uploads/2015/07/rdkit_moldraw2d_2.html\n",
    "from __future__ import print_function\n",
    "from rdkit import Chem\n",
    "from rdkit.Chem.Draw import IPythonConsole\n",
    "from IPython.display import SVG\n",
    "\n",
    "from rdkit.Chem import rdDepictor\n",
    "from rdkit.Chem.Draw import rdMolDraw2D\n",
    "def moltosvg(mol,molSize=(450,150),kekulize=True):\n",
    "    mc = Chem.Mol(mol.ToBinary())\n",
    "    if kekulize:\n",
    "        try:\n",
    "            Chem.Kekulize(mc)\n",
    "        except:\n",
    "            mc = Chem.Mol(mol.ToBinary())\n",
    "    if not mc.GetNumConformers():\n",
    "        rdDepictor.Compute2DCoords(mc)\n",
    "    drawer = rdMolDraw2D.MolDraw2DSVG(molSize[0],molSize[1])\n",
    "    drawer.DrawMolecule(mc)\n",
    "    drawer.FinishDrawing()\n",
    "    svg = drawer.GetDrawingText()\n",
    "    return svg\n",
    "\n",
    "def render_svg(svg):\n",
    "    # It seems that the svg renderer used doesn't quite hit the spec.\n",
    "    # Here are some fixes to make it work in the notebook, although I think\n",
    "    # the underlying issue needs to be resolved at the generation step\n",
    "    return SVG(svg.replace('svg:',''))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "smiles: CC1=NCCC(C)O1\n"
     ]
    },
    {
     "data": {
      "image/svg+xml": [
       "<svg baseProfile=\"full\" height=\"150px\" version=\"1.1\" width=\"450px\" xml:space=\"preserve\" xmlns:rdkit=\"http://www.rdkit.org/xml\" xmlns:svg=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
       "<rect height=\"150\" style=\"opacity:1.0;fill:#FFFFFF;stroke:none\" width=\"450\" x=\"0\" y=\"0\"> </rect>\n",
       "<path d=\"M 106.906,143.182 165.953,109.091\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path d=\"M 165.953,109.091 165.953,78.75\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path d=\"M 165.953,78.75 165.953,48.4091\" style=\"fill:none;fill-rule:evenodd;stroke:#0000FF;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path d=\"M 179.589,99.9886 179.589,78.75\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path d=\"M 179.589,78.75 179.589,57.5114\" style=\"fill:none;fill-rule:evenodd;stroke:#0000FF;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path d=\"M 165.953,109.091 191.974,124.114\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path d=\"M 191.974,124.114 217.995,139.138\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path d=\"M 172.453,37.156 198.727,21.9871\" style=\"fill:none;fill-rule:evenodd;stroke:#0000FF;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path d=\"M 198.727,21.9871 225,6.81818\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path d=\"M 225,6.81818 284.047,40.9091\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path d=\"M 284.047,40.9091 284.047,109.091\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path d=\"M 284.047,109.091 343.094,143.182\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path d=\"M 284.047,109.091 258.026,124.114\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path d=\"M 258.026,124.114 232.005,139.138\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<text style=\"font-size:15px;font-style:normal;font-weight:normal;fill-opacity:1;stroke:none;font-family:sans-serif;text-anchor:start;fill:#0000FF\" x=\"159.452\" y=\"48.4091\"><tspan>N</tspan></text>\n",
       "<text style=\"font-size:15px;font-style:normal;font-weight:normal;fill-opacity:1;stroke:none;font-family:sans-serif;text-anchor:start;fill:#FF0000\" x=\"217.995\" y=\"150.682\"><tspan>O</tspan></text>\n",
       "</svg>"
      ],
      "text/plain": [
       "<IPython.core.display.SVG object>"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "smiles = dataset_smiles[index]\n",
    "mol = Chem.MolFromSmiles(dataset_smiles[index])\n",
    "\n",
    "print('smiles:', smiles)\n",
    "svg = moltosvg(mol)\n",
    "render_svg(svg)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[Note] SVG images cannot be displayed on GitHub, but you can see an image of molecule when you execute it on jupyter notebook."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Interactively watch through the QM9 dataset\n",
    "\n",
    "Jupyter notebook provides handy module to check/visualize the data. Here interact module can be used to interactively check the internal of QM9 dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "index=114829, SMILES=CCC1CC2OCC1O2\n",
      "atom [6 6 6 6 6 8 6 6 8]\n",
      "labels [   3.248    1.224    1.16     1.911   76.47    -0.249    0.082    0.331\n",
      " 1179.493    0.185 -424.24  -424.232 -424.231 -424.272   31.209]\n"
     ]
    },
    {
     "data": {
      "image/svg+xml": [
       "<svg baseProfile=\"full\" height=\"150px\" version=\"1.1\" width=\"450px\" xml:space=\"preserve\" xmlns:rdkit=\"http://www.rdkit.org/xml\" xmlns:svg=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n",
       "<rect height=\"150\" style=\"opacity:1.0;fill:#FFFFFF;stroke:none\" width=\"450\" x=\"0\" y=\"0\"> </rect>\n",
       "<path d=\"M 400.396,89.0292 334.271,30.345\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path d=\"M 334.271,30.345 250.386,58.269\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path d=\"M 250.386,58.269 223.671,142.546\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path d=\"M 250.386,58.269 178.489,6.81818\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path d=\"M 223.671,142.546 135.263,143.182\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path d=\"M 135.263,143.182 102.817,119.963\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path d=\"M 102.817,119.963 70.3709,96.7438\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path d=\"M 135.263,143.182 167.336,119.526\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path d=\"M 167.336,119.526 199.408,95.8694\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path d=\"M 65.7435,84.231 77.9124,45.8424\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path d=\"M 77.9124,45.8424 90.0812,7.45367\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path d=\"M 90.0812,7.45367 178.489,6.81818\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path d=\"M 178.489,6.81818 191.203,45.0105\" style=\"fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<path d=\"M 191.203,45.0105 203.917,83.2028\" style=\"fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1\"/>\n",
       "<text style=\"font-size:15px;font-style:normal;font-weight:normal;fill-opacity:1;stroke:none;font-family:sans-serif;text-anchor:start;fill:#FF0000\" x=\"56.3613\" y=\"99.231\"><tspan>O</tspan></text>\n",
       "<text style=\"font-size:15px;font-style:normal;font-weight:normal;fill-opacity:1;stroke:none;font-family:sans-serif;text-anchor:start;fill:#FF0000\" x=\"199.408\" y=\"98.2028\"><tspan>O</tspan></text>\n",
       "</svg>"
      ],
      "text/plain": [
       "<IPython.core.display.SVG object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from ipywidgets import interact\n",
    "import numpy as np\n",
    "np.set_printoptions(precision=3, suppress=True)\n",
    "\n",
    "def show_dataset(index):\n",
    "    print('index={}, SMILES={}'.format(index, dataset_smiles[index]))\n",
    "    atom, adj, labels = dataset[index]\n",
    "    print('atom', atom)\n",
    "    # print('adj', adj)\n",
    "    print('labels', labels)\n",
    "    mol = Chem.MolFromSmiles(dataset_smiles[index])\n",
    "    return render_svg(moltosvg(mol))\n",
    "\n",
    "interact(show_dataset, index=(0, len(dataset) - 1, 1))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Appendix: how to save the molecule figure?\n",
    "\n",
    "\n",
    "### 1. Save with SVG format\n",
    "\n",
    "First method is simply save svg in file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import os\n",
    "dirpath = 'images'\n",
    "\n",
    "if not os.path.exists(dirpath):\n",
    "    os.mkdir(dirpath)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def save_svg(mol, filepath):\n",
    "    svg = moltosvg(mol)\n",
    "    with open(filepath, \"w\") as fw:\n",
    "        fw.write(svg)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "drawing images\\mol_7777.svg\n"
     ]
    }
   ],
   "source": [
    "index = 7777\n",
    "save_filepath = os.path.join(dirpath, 'mol_{}.svg'.format(index))\n",
    "print('drawing {}'.format(save_filepath))\n",
    "\n",
    "mol = Chem.MolFromSmiles(dataset_smiles[index])\n",
    "save_svg(mol, save_filepath)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. Save with png format\n",
    "\n",
    "`rdkit` provides `Draw.MolToFile` method to visualize mol instance and save it to png format."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from rdkit.Chem import Draw\n",
    "\n",
    "def save_png(mol, filepath, size=(600, 600)):\n",
    "    Draw.MolToFile(mol, filepath, size=size)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "drawing images\\mol_7777.png\n"
     ]
    }
   ],
   "source": [
    "from rdkit.Chem import Draw\n",
    "index = 7777\n",
    "save_filepath = os.path.join(dirpath, 'mol_{}.png'.format(index))\n",
    "print('drawing {}'.format(save_filepath))\n",
    "\n",
    "mol = Chem.MolFromSmiles(dataset_smiles[index])\n",
    "save_png(mol, save_filepath, size=(600, 600))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python [conda root]",
   "language": "python",
   "name": "conda-root-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
